What is your corpus, why did you choose it, and what do you think is interesting about it?
Last.fm is an online music database, music recommender system, and social networking service, founded back when MSN, Myspace, and Runescape were still a thing. The website offers a plugin for your PC and phone that tracks your listening behaviour. Each listen, called a scrobble, is sent (or “scrobbled”) to the database and displayed on your personal profile. Based on the collected data, Last.fm can also recommend new music or connect you with people who have a similar music taste. Although the social features have been watered down, I’ve been using the service since June 2011 (my profile). With such a vast amount of data available, it would be a waste to leave it untouched. That is why I’m interested in learning how my listening has changed over the years.
As of December 31st, 2020, I have over 97.000 registered scrobbles and 24.000 unique tracks, collected over the course of ten years. That is too much for the scope of this course, so I will limit the corpus to a fixed number of top tracks per year. This makes the data easier to explore without losing sight of my general listening behaviour.
What are the natural groups or comparison points in your corpus and what is expected between them?
My corpus is divided into years, from June 2011 to the end of 2020. According to a NYTimes article, our musical taste is established during our formative teenage years. If that is the case, I’d expect a certain music style from my teenage years to show up throughout my corpus. Apart from that, I still expect my music preferences to have changed as I grew older.
How representative are the tracks in your corpus for the groups you want to compare?
I used Spotlistr and Soundiiz to transfer 60 tracks per year from my last.fm profile to Spotify. Because at some point I listened to whole albums more than to individual songs, I took the top 10 tracks of each year and randomly sampled the remaining 50 from ranks #11-100 to broaden the scope. Sometimes the tool didn’t pick the correct track due to changes in the metadata, which I corrected manually. Examples include band name changes, such as ‘Viet Cong’ to ‘Preoccupations’ and ‘Andrew Jackson Jihad’ to ‘AJJ’. If a top 10 song was missing, the next song was selected instead (#11 and so on). I also removed tracks that are considered intros or interludes.
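The selection rule can be sketched in Python. This is a minimal illustration under stated assumptions, not the workflow I actually used, since Spotlistr and Soundiiz handled the transfer. The function name `select_tracks` is hypothetical. So is the assumption that tracks are identified by name, and so is the `available` set, which stands for the tracks that could be matched on Spotify. Removing intros and interludes is not modelled.

```python
import random


def select_tracks(ranked, available, n_top=10, n_random=50, pool_end=100, seed=None):
    """Hypothetical sketch: the top n_top tracks plus a random sample from the rest of the top pool_end.

    ranked    -- track identifiers ordered by scrobble count (rank #1 first)
    available -- set of tracks that could be matched on Spotify
    """
    # Keep only tracks that survived the transfer, preserving rank order.
    matched = [t for t in ranked[:pool_end] if t in available]
    # A missing top-10 track is replaced by the next available one (#11 and so on).
    top = matched[:n_top]
    # The remaining slots are sampled without replacement from the rest of the pool.
    rest = matched[n_top:]
    rng = random.Random(seed)
    return top + rng.sample(rest, min(n_random, len(rest)))


# Usage with made-up ranks: track "t3" could not be matched, so "t11" moves into the top 10.
ranked = [f"t{i}" for i in range(1, 121)]
selection = select_tracks(ranked, set(ranked) - {"t3"}, seed=2020)
print(selection[:10])  # ['t1', 't2', 't4', 't5', 't6', 't7', 't8', 't9', 't10', 't11']
```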
My corpus comes with a few limitations:
- I only started using the last.fm plugin on my smartphone in February 2015. Before then, I relied on my PC/laptop to log my scrobbles, which makes the data between 2011 and 2014 less accurate.
- Certain music styles may be under-represented in my 60-track selections. As a simple example: song 1 of style A has 100 scrobbles, while songs 2 and 3 of style B have 60 each. If only the top track were selected, style A would be represented and style B would not, even though style B has 120 scrobbles in total against style A's 100.
- On a handful of occasions, I fell asleep with my music and last.fm still running.
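The under-representation example in the second limitation can be checked with a few lines of Python. The tracks and styles are the hypothetical ones from the example, and the helper names are made up for illustration. If only the top track by scrobbles is kept, style A is represented and style B is not, even though style B has more scrobbles in total.

```python
from collections import Counter

# The hypothetical tracks from the example: (track, style, scrobbles).
tracks = [("song 1", "A", 100), ("song 2", "B", 60), ("song 3", "B", 60)]


def style_totals(tracks):
    """Total scrobbles per style across all tracks."""
    totals = Counter()
    for _, style, plays in tracks:
        totals[style] += plays
    return totals


def styles_in_top(tracks, k):
    """How many of the top-k tracks (ranked by scrobbles) belong to each style."""
    top = sorted(tracks, key=lambda t: t[2], reverse=True)[:k]
    return Counter(style for _, style, _ in top)


print(style_totals(tracks))      # Counter({'B': 120, 'A': 100})
print(styles_in_top(tracks, 1))  # Counter({'A': 1})
```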
Identify several tracks in your corpus that are either extremely typical or atypical
Typical songs:
Processed by the Boys - Protomartyr: Post-punk, the style of songs like this one, has dominated my corpus since 2011.
Ferrum - Chihei Hatakeyama: Typical of the music (ambient) I listen to when I need to focus on work or study (especially since starting university).
Atypical songs:
Setsuyakuka - Tricot: Classified as math rock, a genre which gained prominence in my corpus starting from 2016.
Rosebud - U.S. Girls: I don’t listen to much music that is considered ‘pop’. Still, this track ranks high in my last.fm charts.
Gruppa Krovi - Kino: Alongside Tricot, one of only two non-English-language groups in my all-time top 10.
View my corpus per year:
The interactive barplot shows how many scrobbles were recorded each year from June 11th, 2011 to December 31st, 2020, with the totals displayed at the top of each bar. Zooming in on the bottom of the bars reveals the top 5 most listened-to artists of each year. Events that could have influenced my listening behaviour are included as well.
Looking at all my recorded scrobbles from start to finish, you can see that the number increased significantly in 2015 and peaked in 2016 at 17773. As I mentioned in the previous slide, this increase can be explained by the fact that I started using the last.fm plugin on my phone after buying one that supported it (monthly scrobbles rose from 249 in December 2014 to 1635 in January 2015). Another possible reason is that this period marked a new chapter as a university student. During this time, I spent more time listening to music during study sessions and made an effort to explore new music by spending (too much) money on live music events.
From then on, scrobbles declined continuously, down to 4492 in 2020. The COVID-19 pandemic seems to have had an influence: in the six months following the lockdown in March, I averaged around 375 scrobbles per month, compared to around 540 per month in the six months before.
On the surface, other interesting developments appear when looking at the top 5 most listened-to artists over the years. In 2011 and 2012, my top 5s were dominated by UK artists (9 out of 10). The top 5s of 2013 and 2014 became more diverse, with 4 of the 10 artists coming from outside the UK. My exposure to music produced in East Asia has had a noticeable effect on my top 5s since late 2018. This music became so dominant that it took over the entire top 5 in 2019 and most of it in 2018 and 2020.
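The yearly totals and top 5 artists shown in the barplot could be computed roughly as follows. This sketch assumes a scrobble export reduced to (artist, timestamp) pairs. That format is hypothetical, because the real last.fm export carries more fields, and `yearly_summary` is a made-up helper name.

```python
from collections import Counter, defaultdict
from datetime import datetime


def yearly_summary(scrobbles, top_n=5):
    """Map each year to (total scrobbles, top_n most-played artists).

    scrobbles -- iterable of (artist, datetime) pairs, one per listen (hypothetical format)
    """
    per_year = defaultdict(Counter)
    for artist, played_at in scrobbles:
        per_year[played_at.year][artist] += 1
    return {
        year: (sum(counts.values()), [artist for artist, _ in counts.most_common(top_n)])
        for year, counts in sorted(per_year.items())
    }


# Made-up example data, not taken from my actual profile.
scrobbles = [
    ("Joy Division", datetime(2011, 6, 11, 20, 15)),
    ("Protomartyr", datetime(2020, 1, 3, 9, 0)),
    ("Protomartyr", datetime(2020, 2, 14, 22, 30)),
    ("Kino", datetime(2020, 3, 1, 18, 45)),
]
print(yearly_summary(scrobbles))
# {2011: (1, ['Joy Division']), 2020: (3, ['Protomartyr', 'Kino'])}
```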
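The before/after comparison can be sketched as follows, assuming monthly scrobble counts keyed by the first day of each month. Two choices here are assumptions: the cutoff month (March 2020) counts as the first month after the lockdown, and months without data count as zero. The helper names are hypothetical.

```python
from datetime import date


def shift_month(d, k):
    """Move a first-of-month date k months forward (negative k moves back)."""
    index = d.year * 12 + (d.month - 1) + k
    return date(index // 12, index % 12 + 1, 1)


def mean_monthly_around(monthly, cutoff, months=6):
    """Average monthly scrobbles in the `months` before `cutoff` and from `cutoff` onwards.

    monthly -- dict mapping date(year, month, 1) to that month's scrobble count
    Assumptions: the cutoff month counts as "after"; missing months count as 0.
    """
    before = [monthly.get(shift_month(cutoff, -k), 0) for k in range(1, months + 1)]
    after = [monthly.get(shift_month(cutoff, k), 0) for k in range(months)]
    return sum(before) / months, sum(after) / months


# Made-up monthly counts, only to show the call.
monthly = {date(2019, m, 1): 500 for m in range(9, 13)}
monthly.update({date(2020, m, 1): 400 for m in range(1, 9)})
print(mean_monthly_around(monthly, date(2020, 3, 1)))
```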
Spotify offers a number of track-level features that characterize the tracks available on its platform. The plot on the left shows how these features developed over time, using the mean of each feature across the 60 selected tracks per year.
Speechiness and liveness do not seem to have changed much. The low speechiness reflects the fact that I am (proportionally) not much of a rap/hip-hop listener, as such music typically scores between 0.33 and 0.66. Valence rose slightly early on but stabilized around 0.5, whereas energy shifted noticeably upwards in 2016. More on this will follow in the next slide.
Acousticness increased steadily in the early years, but declined after 2014 and has not exceeded 0.2 since. One feature seems to have increased permanently: instrumentalness, which has mostly hovered above 0.3 since 2015. This could be explained by my growing interest in instrumental music by artists such as Toe and GoGo Penguin. Another genre that could have raised instrumentalness around 2015 is ambient (Klara Lewis, Ryuichi Sakamoto), my preferred genre during study sessions. The introduction of this genre could also have pushed acousticness further down since 2015.
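Here is a minimal sketch of the per-year averaging. It assumes each selected track is a dict with a `year` key plus the Spotify audio-feature fields discussed here. The helper name and the data layout are hypothetical, and a dataframe group-by would do the same job.

```python
from collections import defaultdict
from statistics import mean

# Spotify audio-feature fields discussed in this section (each ranges from 0.0 to 1.0).
FEATURES = ("acousticness", "energy", "instrumentalness", "liveness", "speechiness", "valence")


def yearly_feature_means(tracks):
    """Mean of each audio feature per year.

    tracks -- iterable of dicts with a "year" key plus the FEATURES fields (hypothetical layout)
    """
    by_year = defaultdict(list)
    for track in tracks:
        by_year[track["year"]].append(track)
    return {
        year: {f: mean(t[f] for t in rows) for f in FEATURES}
        for year, rows in sorted(by_year.items())
    }


# Two made-up tracks from the same year.
example = [
    {"year": 2015, "acousticness": 0.1, "energy": 0.6, "instrumentalness": 0.5,
     "liveness": 0.1, "speechiness": 0.04, "valence": 0.4},
    {"year": 2015, "acousticness": 0.3, "energy": 0.8, "instrumentalness": 0.1,
     "liveness": 0.2, "speechiness": 0.06, "valence": 0.6},
]
print(yearly_feature_means(example)[2015]["energy"])
```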
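Spotify's audio-features documentation gives three bands for speechiness. Values above 0.66 are probably entirely spoken word. Values from 0.33 to 0.66 are a mix of music and speech, rap included. Values below 0.33 are mostly music. A small helper could apply those bands. The function names are hypothetical, and exactly how the boundary values are assigned is an assumption.

```python
def speechiness_band(value):
    """Bucket a Spotify speechiness value (0.0-1.0) using Spotify's documented thresholds."""
    if value > 0.66:
        return "spoken word"
    if value >= 0.33:
        return "music and speech (e.g. rap)"
    return "music"


def share_in_band(values, band):
    """Fraction of tracks whose speechiness falls in the given band."""
    return sum(speechiness_band(v) == band for v in values) / len(values)


# Made-up speechiness values for four tracks.
values = [0.04, 0.05, 0.41, 0.07]
print(share_in_band(values, "music and speech (e.g. rap)"))  # 0.25
```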
Spotify’s energy and valence translated into four moods
The ‘moods’ of each track in every year can be represented by plotting the energy feature against valence. Instrumentalness (size) and mode (colour) are displayed in the graphs as well. The plot is interactive, so you can zoom in on individual tracks.
Overall, most songs in every year skew towards high energy levels (> 0.5), especially between 2016 and 2020. Valence, on the other hand, shows a wider range, spanning the entire spectrum. From this, it can be concluded that the music I listened to over the past ten years was mostly ‘happy’ or ‘angry’.
An interesting observation is that my listening habits gradually became ‘sadder’ from 2012 to 2014. What’s also striking is that my tracks rarely go deep into the ‘relaxed’ quadrant, with the exception of one song in 2015 (Jùhachi). Looking at the size of each point, you can clearly see that instrumentalness increased substantially from 2015 onwards, as mentioned earlier.
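Here is a minimal sketch of the quadrant logic. It assumes a cut-off of 0.5 on both axes and uses the quadrant names from this slide (happy, angry, sad, relaxed). The threshold and the function names are assumptions, not Spotify definitions.

```python
from collections import Counter


def mood(energy, valence, threshold=0.5):
    """Place a track in one of four energy/valence quadrants (both features range 0.0-1.0)."""
    if energy >= threshold:
        return "happy" if valence >= threshold else "angry"
    return "relaxed" if valence >= threshold else "sad"


def mood_counts(tracks):
    """Count tracks per quadrant; tracks is an iterable of (energy, valence) pairs."""
    return Counter(mood(e, v) for e, v in tracks)


# Made-up (energy, valence) pairs for one year.
year_tracks = [(0.8, 0.7), (0.9, 0.2), (0.7, 0.3), (0.3, 0.2)]
print(mood_counts(year_tracks))  # Counter({'angry': 2, 'happy': 1, 'sad': 1})
```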
Based on these three visualizations, my logs can be categorized into (give or take) four periods:
It seems that one particular genre keeps coming back in my corpus throughout the years: post-punk. In the corpus, this genre is represented by Joy Division (Mid-VWO), Kino (Late-VWO), Preoccupations (UvA), and Protomartyr (UvA/Post-HK). I assume this is what the NYTimes writer meant by musical taste peaking in our teens. Therefore, I will examine the most listened-to song of each band in the following slides.
Chromagrams
Outlier: Song is very reliant on silence, with occasional …
Things to consider: Most listened song vs favourite song of 2020. Compare multiple low valence/energy songs and see if they have something in common.
The song by Chihei Hatakeyama is interesting, as it is mostly dominated by major notes.
Self-Similarity Matrices
[W08: Draft]
On the left, you see two self-similarity matrices displaying the chroma and timbre features of Processed by the Boys by Protomartyr (top track of 2020). As for the settings: segments are set to bars, and the normalization and summary statistic are Euclidean and root mean square, respectively. Each cell compares two segments of the song; the darker the colour, the more similar those two segments are.
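The idea behind these matrices can be sketched without any music library. This is a simplified illustration, not the implementation behind the plots. The frames within each bar are summarised with root mean square, and each bar vector is scaled to unit Euclidean length. Every pair of bars is then compared with Euclidean distance. That distance measure and the summarise-then-normalise order are assumptions, because the settings above list only the normalisation and the summary statistic.

```python
import math


def rms_summary(frames):
    """Root mean square of each dimension across the frames of one segment (e.g. one bar)."""
    n = len(frames)
    return [math.sqrt(sum(frame[i] ** 2 for frame in frames) / n) for i in range(len(frames[0]))]


def euclidean_normalise(vector):
    """Scale a vector to unit Euclidean length (all-zero vectors are left unchanged)."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector] if length else list(vector)


def self_similarity(segments):
    """Pairwise Euclidean distances between summarised, normalised segments.

    segments -- list of segments, each a list of equal-length frame vectors (chroma or timbre)
    Returns a symmetric matrix with zeros on the diagonal; smaller values mean more similar.
    """
    vectors = [euclidean_normalise(rms_summary(segment)) for segment in segments]
    return [[math.dist(a, b) for b in vectors] for a in vectors]


# Three made-up two-dimensional "bars"; the first and last point in the same direction.
bars = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]], [[2.0, 0.0]]]
for row in self_similarity(bars):
    print([round(x, 3) for x in row])
```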
Findings:
…
…
Things to consider: Compare above to top track of years 2016, 2018 (Rosebud) or other/more years.
[W09: Draft]
With the previous songs in mind, I have created keygrams and chordograms of the top tracks from 2018 (Rosebud by U.S. Girls) and 2020 (Processed by the Boys by Protomartyr).
The prominence of each key/chord is represented by a darker colour.
I used “norm = manhattan; distance = manhattan” for both chordograms, and “norm = manhattan; distance = aitchison” for both keygrams.
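The two distance measures and the Manhattan normalisation can be sketched in plain Python. The sketch follows the textbook definitions: Manhattan distance is the sum of absolute differences, and Aitchison distance is the Euclidean distance between centred log-ratio transforms. It is not necessarily identical to the implementation that produced the plots. The EPS offset, the binary triad templates, and all function names are assumptions made for illustration.

```python
import math

EPS = 1e-9  # assumption: tiny offset so the logarithm is defined for empty chroma bins


def manhattan_normalise(vector):
    """Scale a vector so that its absolute values sum to 1."""
    total = sum(abs(x) for x in vector)
    return [x / total for x in vector] if total else list(vector)


def manhattan(a, b):
    """Sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def aitchison(a, b):
    """Aitchison distance: Euclidean distance between centred log-ratio (clr) transforms."""
    def clr(vector):
        logs = [math.log(x + EPS) for x in vector]
        centre = sum(logs) / len(logs)
        return [value - centre for value in logs]
    return math.dist(clr(a), clr(b))


def best_template(chroma, templates, distance):
    """Name of the key or chord template closest to a (normalised) chroma vector."""
    target = manhattan_normalise(chroma)
    return min(templates, key=lambda name: distance(target, manhattan_normalise(templates[name])))


# Binary triad templates over the 12 pitch classes C, C#, ..., B (illustrative only).
templates = {
    "C:maj": [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0],  # C E G
    "A:min": [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],  # A C E
}
chroma = [0.9, 0.1, 0.0, 0.1, 0.8, 0.1, 0.0, 0.7, 0.1, 0.2, 0.0, 0.1]
print(best_template(chroma, templates, manhattan), best_template(chroma, templates, aitchison))
# C:maj C:maj
```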
Need to work out my findings
Things to consider: Compare above to top track of years 2016 (or other/more years).
Unavailable: Loudness, Key, Mode, Tempo, Time Signature
On the left, you’ll see a histogram of the tempo of every song, grouped into two-year periods. The average tempo is similar across periods, ranging between 125 and 135 bpm. 2015-2016 stands out slightly, at around 134 bpm. This lines up with 2016 having the highest energy feature (see A3).
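Here is a minimal sketch of grouping tempos into two-year periods and averaging them. It assumes (year, tempo in bpm) pairs and periods that start at 2011. The helper names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def period_label(year, start=2011, width=2):
    """Bin a year into two-year periods such as '2011-2012'."""
    first = start + (year - start) // width * width
    return f"{first}-{first + width - 1}"


def mean_tempo_per_period(tracks):
    """Average tempo per period; tracks is an iterable of (year, tempo_bpm) pairs."""
    groups = defaultdict(list)
    for year, tempo in tracks:
        groups[period_label(year)].append(tempo)
    return {period: fmean(tempos) for period, tempos in sorted(groups.items())}


# Made-up tempos, not the values from my corpus.
example = [(2015, 130), (2016, 138), (2019, 120), (2020, 130)]
print(mean_tempo_per_period(example))  # {'2015-2016': 134.0, '2019-2020': 125.0}
```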
Tempograms
For the tempogram analysis, a typical post-punk song (“Unconscious Melody”) and an atypical math-rock song (“Setsuyakuka”) were selected to compare their rhythmic differences. The tempo of Unconscious Melody is mostly constant around 220 bpm, with an occasional switch towards 470 bpm.
The tempo of “Setsuyakuka”, however, is less constant. Although the tempo is relatively pronounced around 380 bpm in the first half, it is all over the place in the second. The section between 150 and 220 seconds in particular is much more varied than the rest of the song. Listening to the song, it is possible that the tempogram had an easier time estimating the tempo there, as that section sounds less ‘noisy’ than the other parts.
[Draft]
Some observations
KNN matrix:
- Periods 2011-2012, 2013-2014, and 2019-2020 are most distinctive compared to others.
- 2017-2018 has a minor overlap with 2015-2016. In the context of my corpus, it could mean that my listening habits were similar between those years.
Random forest:
- Timbre features are important (c01, c03, and c11).
- Instrumentalness is also an important factor. This could be due to math rock (often without vocals in East Asia) and ambient music, both of which became prominent on my last.fm starting from 2015-2016.
Table 1. Precision and recall of the random forest model per group:
| Group | Precision | Recall |
|---|---|---|
| 2011-2012 | 0.444 | 0.467 |
| 2013-2014 | 0.824 | 0.875 |
| 2015-2016 | 0.330 | 0.300 |
| 2017-2018 | 0.353 | 0.342 |
| 2019-2020 | 0.484 | 0.492 |
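Precision and recall are computed per group, whereas accuracy is a single figure for the whole model. Here is a minimal sketch of those definitions. The function names and labels are hypothetical, and in practice a modelling library computes these metrics directly.

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall of one class in a multi-class classification."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(t == label and p == label for t, p in pairs)
    predicted = sum(p == label for _, p in pairs)
    actual = sum(t == label for t, _ in pairs)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct (a single number for the whole model)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels for four tracks, not the actual model output.
y_true = ["2011-2012", "2011-2012", "2013-2014", "2013-2014"]
y_pred = ["2011-2012", "2013-2014", "2013-2014", "2013-2014"]
print(precision_recall(y_true, y_pred, "2013-2014"))  # (0.6666666666666666, 1.0)
print(accuracy(y_true, y_pred))                       # 0.75
```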
To do: Resize figures. Make faceted plots of the top features from the random forest. Use only the top 10 most influential features and replot everything.
TODO
Relevant link: https://www.theverge.com/2018/2/12/17003076/spotify-data-shows-songs-teens-adult-taste-music